PRESENTATION OF COMMUNICATIONS
Patent abstract:
A method for presenting communications is provided. The method may include obtaining, in a device, a request from a user to read a stored message that includes an audio signal. In response to obtaining the request, the method may include directing the audio signal of the message from the device to a transcription system. In these and other embodiments, the transcription system may be configured to generate text that is a transcription of the audio signal in real time. The method may further include obtaining, in the device, the text from the transcription system and presenting, by the device, the text generated by the transcription system in real time. In response to obtaining the text from the transcription system, the method may also include presenting, by the device, the audio signal such that the presented text is substantially aligned with the audio signal.

Publication number: FR3071689A1
Application number: FR1851481
Filing date: 2018-02-21
Publication date: 2019-03-29
Inventor: Brian Chevrier
Applicant: Sorenson IP Holdings LLC
IPC main class:
Patent description:
Presentation of communications

The embodiments described herein relate to the presentation of communications. Telephone answering machines have long been used to answer telephone calls and record voice messages locally for a user. More recently, voicemail systems have been used to answer telephone calls and store voice messages remotely from a user. Voice messages are generally useful because they can be stored while a user is unavailable and then played back at a time that is more convenient for the user. However, voice messages are less useful for a hearing-impaired user because it can be difficult, if not impossible, for the hearing-impaired user to understand the content of a voice message simply by listening to it.

The subject matter claimed here is not limited to embodiments that resolve any particular disadvantage or that operate only in environments such as those described in the background above. Indeed, this background is given only to illustrate an example of a technological field in which certain embodiments described here can be put into practice.

According to embodiments of the invention, there is provided a method for presenting communications. The method may include obtaining, in a device, a request from a user to read a stored message that includes an audio signal. In response to obtaining the request, the method may include directing the audio signal of the message from the device to a transcription system. In these embodiments, as well as in others, the transcription system can be configured to generate text which is a transcription of the audio signal in real time. The method may further include obtaining, in the device, text from the transcription system and presenting, by the device, the text generated by the transcription system in real time.
In response to obtaining text from the transcription system, the method may also include presenting, by the device, the audio signal in such a way that the presented text is substantially aligned with the audio signal.

According to one aspect of the invention, a captioning device is proposed, comprising: a screen; a speaker; a processor communicatively coupled to the screen and the speaker; and at least one computer-readable medium communicatively coupled to the processor and configured to store one or more instructions which, when executed by the processor, cause the captioning device to perform operations comprising: obtaining, in the captioning device, a request from a user to read a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal from the captioning device to a transcription system, the transcription system being configured to generate text which is a transcription of the audio signal in real time; in response to obtaining the request, generating an audio stream to be sent to the speaker; buffering the audio stream before sending the audio stream to the speaker; obtaining, in the captioning device, the text generated by the transcription system; presenting, on the screen, the text from the transcription system in real time; and in response to obtaining text from the transcription system, transmitting the buffered audio stream to the speaker for presentation by the speaker such that the text is substantially aligned with the audio stream presented by the speaker.

According to one embodiment, the message is stored in the computer-readable medium or media of the captioning device.

According to one embodiment, the message is stored outside the captioning device and the operations further include obtaining, in the captioning device, the message via a network, in which the audio signal of the message is directed to the transcription system after being obtained by the captioning device.
According to one embodiment, the message further comprises a video signal and the operations further comprise, in response to obtaining the text from the transcription system, presenting, on the screen, the video signal in such a way that the text is substantially aligned with the audio and video signals.

According to one embodiment, the operations further comprise, during an interval between obtaining the request from the user and presenting the audio signal, presenting, on the screen, a message signaling to the user a delay in the presentation of the audio signal and the text.

According to one embodiment, the operations further comprise determining a buffer length based on a time difference between the transmission of the audio signal to the transcription system and the obtaining of the text from the transcription system, the audio stream being continuously buffered based on the determined buffer length.

According to one embodiment, the operations further comprise, after determining the buffer length, adjusting the buffer length based on a network connection between the transcription system and the captioning device.

According to one embodiment, the message is a first message, and the operations further include: obtaining, in the captioning device, a second request from a user to read a second stored message comprising a second audio signal; in response to obtaining the second request, directing the second audio signal from the captioning device to a transcription system, the transcription system being configured to generate a second text which is a transcription of the second audio signal in real time; buffering a second audio stream based on the second audio signal for the determined buffer length; and transmitting the second buffered audio stream to the speaker for presentation by the speaker regardless of the second text provided by the transcription system.
According to another aspect of the invention, a method for presenting communications is proposed, the method comprising: obtaining, in a device, a request from a user to read a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal of the message from the device to a transcription system, the transcription system being configured to generate text which is a transcription of the audio signal in real time; obtaining, in the device, text from the transcription system; presenting, by the device, the text generated by the transcription system in real time; and in response to obtaining text from the transcription system, presenting, by the device, the audio signal such that the presented text is substantially aligned with the audio signal.

According to one embodiment, the message is stored in the device.

According to one embodiment, the message is stored outside the device and the method further comprises obtaining, in the device, the message via a network, the audio signal of the message being directed to the transcription system after having been obtained by the device.

According to one embodiment, the message further comprises a video signal and the method further comprises, in response to obtaining text from the transcription system, presenting, by the device, the video signal in such a way that the text is substantially aligned with the audio and video signals.

According to one embodiment, the method further comprises: generating an audio stream intended to be presented by the device; and buffering the audio stream before the audio stream is presented by the device; wherein the audio signal presented by the device is the buffered audio stream, such that the text is substantially aligned with the buffered audio stream presented by the device.
According to one embodiment, the method further comprises determining a buffer length based on a time difference between the transmission of the audio signal to the transcription system and the obtaining of the text from the transcription system, the audio stream being continuously buffered based on the determined buffer length.

According to one embodiment, the method further comprises, after determining the buffer length, adjusting the buffer length based on a network connection between the transcription system and the device.

According to another aspect of the invention, there is provided one or more computer-readable media configured to store one or more instructions which, when executed by one or more processors, cause the device to execute the method defined above.

According to another aspect of the invention, a method for presenting a communication is proposed, the method comprising: obtaining, in a device, a request from a user to read a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal of the message to a transcription system, the transcription system being configured to generate text which is a transcription of the audio signal in real time; in response to obtaining the request, generating an audio stream to be presented by the device; buffering the audio stream before the audio stream is presented by the device; obtaining, in the device, text generated by the transcription system; and in response to obtaining text from the transcription system, presenting, by the device, the buffered audio stream and the text such that the presented text is substantially aligned with the buffered audio stream presented by the device.
According to one embodiment, the method further comprises determining a buffer length based on a time difference between the transmission of the audio signal to the transcription system and the obtaining of the text from the transcription system, the audio signal being continuously buffered based on the determined buffer length.

According to one embodiment, the message is stored in the device.

According to another aspect of the invention, a device is proposed which comprises at least one computer-readable medium configured to store one or more instructions which, when executed by one or more processors, cause the device to execute the method defined above.

Examples of embodiments will be described and explained in more detail with the aid of the appended drawings, in which: FIG. 1 represents an example of an environment for the presentation of communications; FIGS. 2A, 2B and 2C represent examples of displays which may appear on a screen of a device associated with the presentation of a communication by the device; FIG. 3 represents examples of operations for presenting a communication; FIG. 4 shows an example of a system which can be used to present a communication; FIG. 5 is a flow diagram of an example method for presenting a communication; and FIG. 6 is a flow diagram of another example method for presenting a communication.

Some embodiments of the present invention relate to devices and methods that can be configured to present communications. In some embodiments, the communications may be messages, such as video or voice messages. It may be difficult or impossible for a hearing-impaired user to understand the content of a message, such as a video or voice message, simply by listening to the message. To make a message useful to a hearing-impaired user, a transcript of the message can be generated by a remote service that serves hearing-impaired users.
One problem with transcribing a message is that the remote service may not have access to the message until the start of a transcription session in which the local device is connected to the remote service. As a result, the presentation of the message transcription and the playback of the audio signal of the message may not be synchronized. In these embodiments, as well as in others, the transcription, when presented, may lag the playback of the message, such that the audio signal of the message is presented before the transcription.

In some embodiments, in response to receiving a request to read a message, a device can send the message to the remote service and buffer the message until real-time transcription begins to be received. The term "real-time transcription" used herein refers to the real-time processing of audio data and the generation of a transcript of the audio data. After receiving the transcription, the device can play back the audio signal of the buffered message simultaneously with the presentation of the transcription, so that the audio signal and the transcription are substantially synchronized.

Referring to the figures, FIG. 1 shows an example of an environment 100 for the presentation of communications. The environment 100 can be arranged according to at least one embodiment described herein. The environment 100 may include a network 102, a first device 104, a second device 106, a transcription system 108 and a database 130. The network 102 can be configured to communicatively couple the first device 104, the second device 106, the transcription system 108 and the database 130. In certain embodiments, the network 102 can be any network or network configuration configured to send and receive communications between systems and devices. In certain embodiments, the network 102 can comprise a conventional type of network, a wired network or a wireless network, and it can have several different configurations.
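The buffer-until-first-text behavior just described can be sketched in Python as follows. This is a minimal illustration, not the patented implementation; the transcription client, speaker, and screen interfaces are hypothetical names assumed for the sketch.

```python
import queue
import threading


class MessagePlayback:
    """Sketch: audio is streamed to the transcription service while being held
    locally, and playback starts only once the first transcript text arrives,
    so the audio and the text stay substantially aligned."""

    def __init__(self, transcription_client, audio_out, display):
        self.transcription_client = transcription_client  # sends audio, yields text
        self.audio_out = audio_out                        # speaker interface
        self.display = display                            # screen interface
        self.buffer = queue.Queue()                       # holds buffered audio chunks

    def play_message(self, message_audio_chunks):
        first_text = threading.Event()

        def receive_text():
            for text in self.transcription_client.transcripts():
                self.display.show(text)   # present transcript in real time
                first_text.set()          # first text unblocks buffered playback

        threading.Thread(target=receive_text, daemon=True).start()

        for chunk in message_audio_chunks:
            self.transcription_client.send(chunk)  # direct audio to the service
            self.buffer.put(chunk)                 # buffer it locally meanwhile

        first_text.wait()                 # hold playback until text has arrived
        while not self.buffer.empty():
            self.audio_out.play(self.buffer.get())
```

In a real device the buffer would be drained at playback rate rather than all at once; the queue here only shows the ordering constraint between first text and first audio.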
In some embodiments, the network 102 can also be coupled to, or can include parts of, a telecommunications network, including telephone lines, to send data under different communication protocols, such as a conventional telephone system.

Each of the first and second devices 104 and 106 can be any electronic or digital device. For example, each of the first and second devices 104 and 106 may include a desktop computer, a laptop computer, a smartphone, a mobile phone, a digital tablet, a telephone, a telephone console, a captioning device, a captioning telephone or any other computing device. In some embodiments, each of the first device 104 and second device 106 may include memory and at least one processor, which are configured to perform operations described herein, among other operations. In some embodiments, each of the first device 104 and second device 106 may include computer-readable instructions which are configured to be executed by each of the first device 104 and second device 106 to perform the operations described herein.

In some embodiments, each of the first and second devices 104 and 106 can be configured to establish communication sessions with other devices. For example, each of the first and second devices 104 and 106 can be configured to establish an outgoing communication session, such as a telephone call, video call or other communication session, with another device via a telephone line or network. For example, each of the first device 104 and second device 106 can communicate via a wireless cellular network, a wired Ethernet network or a conventional telephone line. As a variant or in addition, each of the first device 104 and second device 106 can communicate via other wired or wireless networks which do not comprise, or only partially comprise, a conventional telephone network.
For example, a communication session between the first device 104 and the second device 106, such as a telephone call, can be a voice telephone call using the Internet Protocol (VoIP). As another example, the communication session between the first device 104 and the second device 106 can be a video communication session or another communication session. Alternatively or additionally, each of the first and second devices 104 and 106 can be configured to communicate with other systems via a network, such as the network 102 or another network. In these embodiments, as well as in others, each of the first device 104 and second device 106 can receive data from and send data to the transcription system 108.

In some embodiments, the transcription system 108 may include any hardware configuration, such as processors, servers and database servers that are networked together and configured to perform a task. For example, the transcription system 108 may include multiple computer systems, such as multiple servers which each include memory and at least one processor, which are networked together and configured to perform captioning operations for communication sessions, such as telephone calls, between devices like the second device 106 and another device described herein. In these embodiments, as well as in others, the transcription system 108 can operate to generate transcripts of an audio signal from one or more speakers in a communication session. For example, the transcription system 108 can generate transcripts of audio signals generated by devices other than the second device 106, or by both the second device 106 and other devices, among other configurations. In some embodiments, the transcription system 108 can operate as a central office configured to establish communication sessions, such as telephone calls, video calls, etc., between devices such as the second device 106 and one or more other devices as described herein, among other operations.
In some embodiments, the transcription system 108 may include computer-readable instructions which are configured to be executed by the transcription system 108 to perform the operations described herein.

In addition, in some embodiments, the environment 100 can be configured to facilitate an assisted communication session between a hearing-impaired user 112 and a second user, such as a user 110. As used herein, the phrase "hearing-impaired user" may mean someone with impaired hearing. Hearing-impaired users often have a level of hearing ability that has generally decreased over time, so that the hearing-impaired user can communicate by speaking but often has great difficulty hearing and/or understanding others.

In some embodiments, the assisted communication session can be established between the first device 104 and the second device 106. In these embodiments, the second device 106 can be configured to present transcripts of the communication session to the hearing-impaired user 112. For example, the second device 106 may be a CaptionCall® 57T-series or 67T-series captioning telephone, or a device on which the CaptionCall® mobile application runs. For example, in some embodiments, the second device 106 may include a display screen 120, such as a touch screen or other display screen, which is integrated with the second device 106 and which is configured to present text transcripts of a communication session to the hearing-impaired user 112. As a variant or in addition, the second device 106 may be associated with a display screen which is physically separate from the second device 106 and which is in wireless communication with the second device 106, such as a display screen of a portable device 122 worn on the wrist of the hearing-impaired user 112 and configured to be in Bluetooth wireless communication with the second device 106.
Other physically separate screens can be display screens of desktop computers, laptop computers, smartphones, mobile phones, tablets or any other electronic device that is in wireless communication with the second device 106. The second device 106 may also include a speaker 124, such as a handset speaker or a speakerphone speaker. The second device 106 may also include a processor communicatively coupled to the display screen 120 and to the speaker, as well as at least one non-transitory computer-readable medium communicatively coupled to the processor and configured to store one or more instructions which, when executed by the processor, perform the message presentation methods described here, and which also locally store voice messages on the second device 106.

During a communication session, the transcription system 108, the first device 104 and the second device 106 can be communicatively coupled using networking protocols. In certain embodiments, during the communication session between the first device 104 and the second device 106, the second device 106 can transmit the audio signal coming from the first device 104 to the transcription system 108. As a variant or in addition, the first device 104 can transmit the audio signal to the transcription system 108 and the transcription system 108 can relay the audio signal to the second device 106.

In the transcription system 108, the audio data can be transcribed. In some embodiments, to transcribe the audio data, a transcription engine can generate a transcription of the audio signal. As a variant or in addition, a remote call assistant 114 can listen to the audio signal transmitted by the first device 104 to the transcription system 108, via the second device 106, and "revoice" the words of the user 110 to a speech recognition computer program tuned to the voice of the remote call assistant 114.
In these embodiments, as well as in others, the remote call assistant 114 may be an operator who acts as a human intermediary between the hearing-impaired user 112 and the user 110. In some embodiments, text transcripts can be generated by a speech recognition computer as a transcription of the audio signal of the user 110. After the generation of the text transcriptions, the text transcriptions can be transmitted to the second device 106 via the network 102. The second device 106 can display the text transcriptions on the display screen 120 while the hearing-impaired user 112 carries on a normal conversation with the user 110. The text transcriptions can allow the hearing-impaired user 112 to supplement the voice signal transmitted by the first device 104 and to confirm their understanding of the words spoken by the user 110. The transcription of a communication session occurring in real time between two devices as described above may be referred to herein as a "transcription communication session".

In addition to generating communication session transcripts, the environment 100 can be configured to provide transcripts of communications from other devices, such as the first device 104. The communications can be messages, such as video messages or audio messages. The communications can be stored locally on the second device 106 or in a database 130. For example, the second device 106 can be configured to function as a local "answering machine" on which the user 110 who is trying to communicate, e.g. to call, using the first device 104 can leave a message, such as a voice message or a video message, when a communication request to the second device 106 does not receive a response from the hearing-impaired user 112. The hearing-impaired user 112 can then use the second device 106 to play back the message with a real-time transcription of the voice or video message provided by the transcription system 108.
As another example, when a request for communication to the second device 106 from the first device 104 does not receive a response from the hearing-impaired user 112, the user 110 can leave a message for the hearing-impaired user 112. This message can be stored in the database 130. The hearing-impaired user 112 can then use the second device 106 to play back the message with a real-time transcription of the message provided by the transcription system 108. The transcription of communications from other devices, such as messages, may be referred to herein as a "transcription message session".

During a transcription message session, the transcription system 108 and the second device 106 can be communicatively coupled using networking protocols. In some embodiments, during the transcription message session, the second device 106 may obtain a request to present a message. The second device 106 can obtain the message. In certain embodiments, the second device 106 can obtain the message from the database 130. As a variant or in addition, the second device 106 can obtain the message from a local memory which is part of the second device 106. After obtaining the message, the second device 106 can transmit an audio signal of the message to the transcription system 108. While the second device 106 transmits the audio data to the transcription system 108, the second device 106 can buffer the message. For example, when the message is a video message, the second device 106 can buffer the audio and video signals of the video message and send only the audio signal to the transcription system 108.

Here, the term "audio" can be used generically to refer to sounds that may include spoken words. In addition, the term "audio" can be used generically to include audio signals in all formats, such as digital, analog or sound wave formats. In addition, in digital format, the audio signal can be compressed using different types of compression programs.
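The video-message handling just described — buffer both signals, forward only the audio — can be sketched as follows. The message format and function names here are illustrative assumptions, not the patent's interfaces.

```python
def forward_audio_for_transcription(message, send_audio):
    """Buffer the full message for later playback while directing only its
    audio signal to the transcription system. `message` is assumed to be a
    dict with an "audio" chunk list and an optional "video" frame list."""
    buffer = {"audio": [], "video": []}
    for frame in message.get("video", []):
        buffer["video"].append(frame)      # video is buffered, never transmitted
    for chunk in message["audio"]:
        buffer["audio"].append(chunk)      # audio is buffered for playback...
        send_audio(chunk)                  # ...and also sent for transcription
    return buffer
```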
Here, the term "video" can be used generically to refer to a compilation of images that can be reproduced in sequence to produce a video. In addition, the term "video" can be used generically to include video signals in all formats. In addition, the video signal can be compressed using different types of compression programs.

In the transcription system 108, the audio signal of the message can be transcribed into text in a manner similar to the transcription performed during a transcription communication session. The text can be transmitted to the second device 106.

In response to receiving the text from the transcription system 108, the second device 106 may determine a length for buffering the message. In these embodiments, as well as in others, the second device 106 can determine the length for buffering the message based on the time elapsed between sending the audio signal to the transcription system 108 and receiving the text from the transcription system 108. For example, if the second device 106 sends the audio signal to the transcription system 108 at time 8:01:05 and receives the first text back from the transcription system 108 at time 8:01:08, the buffer length can be three seconds. As a variant or in addition, the second device 106 can determine the length for buffering the message based on a particular buffer length which was determined before the message was received.

In response to receiving the text from the transcription system 108, the second device 106 can also be configured to begin presenting the message from the buffer. The presentation of the message may include playing the audio signal of the message through the speaker 124 and/or displaying video data of the message through the display screen 120. The message may continue to be buffered after receiving the text from the transcription system 108, in such a way that the message is presented by the second device 106 after having been buffered.
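The timing arithmetic above can be sketched as follows. `adjusted_buffer_length` and its margin are hypothetical illustrations of the network-based adjustment mentioned earlier, not a formula from the patent.

```python
from datetime import datetime


def buffer_length_seconds(audio_sent_at, first_text_at):
    """Buffer length is the delay between sending the audio signal to the
    transcription system and receiving the first text back from it."""
    return (first_text_at - audio_sent_at).total_seconds()


def adjusted_buffer_length(base_seconds, round_trip_seconds, margin=0.5):
    """Hypothetical adjustment: widen the buffer when the network round trip
    to the transcription system grows, keeping at least the measured base."""
    return max(base_seconds, round_trip_seconds + margin)


# The example from the text: audio sent at 8:01:05, first text at 8:01:08.
sent = datetime(2018, 2, 21, 8, 1, 5)
received = datetime(2018, 2, 21, 8, 1, 8)
length = buffer_length_seconds(sent, received)  # 3.0 seconds
```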
In these embodiments, as well as in others, the buffering length can be roughly equal to the time between sending the audio signal to the transcription system 108 and receiving the corresponding text from the transcription system 108. In response to receiving text from the transcription system 108, the second device 106 may also be configured to display the text on the display screen 120, in real time, while simultaneously presenting the message. Consequently, the presentation of the message and the presentation of the text of the transcription of the audio signal of the message can be substantially synchronized. In these embodiments, as well as in others, the text may allow the hearing-impaired user 112 to supplement the audible playback of the message on the second device 106 and to confirm their understanding of the words spoken in the message.

Modifications, additions or omissions can be made to the environment 100 without departing from the scope of the present invention. For example, in some embodiments, the user 110 may also be hard of hearing. In these embodiments, as well as in others, the transcription system 108 may provide text to the first device 104 based on an audio signal transmitted by the second device 106. Alternatively, or in addition, the transcription system 108 may include additional functions. For example, the transcription system 108 can edit the text or make other modifications to the text after the presentation of the text on the second device 106. As a variant, or in addition, in certain embodiments, the environment 100 can comprise additional devices similar to the first and second devices 104 and 106. In these embodiments, as well as in others, the similar devices can be configured to present communications as described herein. FIGS.
2A, 2B and 2C show examples of displays 220, 240 and 260 which may appear on a screen of a device associated with the presentation of a communication by the device, arranged according to at least one embodiment described herein. The display 220 in FIG. 2A shows voice messages 222, 224, 226, 228 and 230 which can be stored locally on a device, like the second device 106 of FIG. 1, or stored externally, for example in the database 130 of FIG. 1. The displays 220, 240 and 260 can be displayed by a screen such as the display screen 120 of the second device 106 of FIG. 1. The voice messages 224 and 230 can be designated by the telephone number of the caller who left the voice message, while the voice messages 222, 226 and 228 are designated by the caller ID of the telephone number of the caller who left the voice message or by the name of a contact stored in the device which corresponds to the caller's telephone number. In addition, the voice messages 224 and 228 may be highlighted to indicate that they have not yet been played, while the voice messages 222, 226 and 230 may not be highlighted to indicate that they have already been played.

After the selection of the voice message 226 on the display 220 of FIG. 2A, and in response to this selection, the display 220 of FIG. 2A can switch to the display 240 of FIG. 2B. For example, the voice message 226 can be selected by the user touching the voice message 226 in the display 220 of FIG. 2A. At least parts of the display 240 may appear during the buffering of the audio signal. For example, during the buffering of the audio signal of the voice message 226, the display 240 may include a message 242 signaling the delay to the hearing-impaired user 112, and/or a message 244 specifying to the hearing-impaired user 112 the origin of the voice message 226, and/or a message 246 specifying to the hearing-impaired user the date and time at which the voice message 226 was created.
After the buffering of the audio signal of the voice message 226, a transition from the display 240 of FIG. 2B to the display 260 of FIG. 2C may occur. The display 260 can display the text 262 of the voice message 226, as provided in real time by a transcription system, during the playback of the voice message 226. Modifications, additions or omissions can be made to the displays 220, 240 and 260 without departing from the scope of the present invention.

FIG. 3 represents examples of operations 300 for presenting a communication. The operations 300 can be arranged according to at least one embodiment described herein. The operations 300 can be carried out between a processing system 314, a transcription system 308, a screen 320 and a speaker 324. In certain embodiments, the processing system 314, the screen 320 and the speaker 324 can be part of a device which can be analogous to the second device 106 of FIG. 1. In these embodiments, as well as in others, the processing system 314 can be a combination of hardware devices, such as processors, memory, modems and other hardware devices that perform the functions described here. As a variant or in addition, the transcription system 308 can be analogous to the transcription system 108 of FIG. 1; consequently, no further description of it will be given. In certain embodiments, the operations 300 can be an example of communications and interactions between the processing system 314, the transcription system 308, the screen 320 and the speaker 324. In general, the operations 300 may relate to the presentation of an audio signal of a message by the speaker 324 simultaneously with the presentation of a text of the audio signal by the screen 320. Interactions between the processing system 314 and the transcription system 308 can take place over one or more networks. The operations 300 shown are not exhaustive and are merely representative of operations that may occur.
In addition, an illustrated operation can represent one or more communications, operations and/or exchanges of data. In operation 340, the screen 320 can transmit a user request to the processing system 314. In certain embodiments, the screen 320 can be a touch screen. In these embodiments, as well as in others, the screen 320 can receive input from a user. The user input can be a selection of a message that was previously left for the user by another person in response to the person's attempt to contact the user. The message can be a voice message or a video message. The user request can be a request to play the message. In some embodiments, playing a message may include presenting the message to a user. The presentation of the message can comprise the presentation of the audio signal of the message by broadcasting the audio signal through the loudspeaker 324. Alternatively or in addition, the presentation of the message can include the presentation of the video signal of the message by displaying the video signal on the screen 320. Alternatively or in addition, the presentation of the message may include the presentation of both the audio and video signals. In certain embodiments, the presentation of the message may include the presentation of the text which corresponds to the audio signal contained in the message. For example, the audio signal contained in the message may include spoken words. The text that corresponds to the audio signal can be the spoken words rendered as text. In these embodiments, as well as in others, the text may be a transcription of the words spoken in the audio signal. In some embodiments, the processing system 314 can receive the user request from an input device other than the screen 320. In operation 342, the processing system 314 can obtain the message in response to receiving the user request of operation 340. In some embodiments, the message can be stored in a memory that is part of the processing system 314.
In these and other embodiments, the message can be stored locally in the device that includes the processing system 314. Alternatively or in addition, the message can be stored outside of the device which includes the processing system 314. For example, the message can be stored in a database which can be accessed via a network. In these embodiments, as well as in others, the processing system 314 may have an identifier associated with the message. The processing system 314 can request the message from the storage location of the message using the identifier. In some embodiments, the storage location can be associated with the transcription system 308. In response to the request for the message, the processing system 314 can obtain the message. In operation 344, the processing system 314 can send the audio signal of the message to the transcription system 308. The processing system 314 can send the audio signal of the message to the transcription system 308 via a network, such as the network 102 of FIG. 1. In these embodiments, as well as in others, the processing system 314 and the transcription system 308 may not be in the same device. In some embodiments, the message may include an audio signal without video. In these embodiments, as well as in others, the entire message can be sent to the transcription system 308. Alternatively or in addition, the message can include audio and video signals. In these and other embodiments, the audio signal of the message can be sent to the transcription system 308. In some embodiments, the audio signal can be sent to the transcription system 308 by streaming the audio signal to the transcription system 308. For example, the processing system 314 can be configured to send the message over an interval of time which is roughly equal to the playback length of the message when the message is presented to a user through the speaker 324.
In some embodiments, the audio signal can be sent to the transcription system 308 in sections, similar to a file transfer. The audio signal can then be sent in a block of time that depends on the network connection between the transcription system 308 and the processing system 314. In operation 346, the processing system 314 can buffer the message. The processing system 314 may begin buffering the message after the audio signal has been sent to the transcription system 308, and in response to this action. The message can be buffered using a first-in, first-out buffer. In some embodiments, the message can be buffered after an audio stream is generated by the processing system 314 for broadcast by the speaker 324. In these embodiments, as well as in others, the audio signal of the message can be passed to a codec or other processing element to generate an audio stream, which can be sent to a digital-to-analog converter (DAC) to be broadcast by the speaker 324. After the generation of the audio stream, the audio stream can be buffered before it is presented by the speaker 324. In some embodiments, the audio stream can be streamed to the transcription system 308 over a network during operation 344. In these embodiments, as well as in others, the entire audio stream may not be generated completely before the audio stream is buffered. Indeed, as parts of the audio stream are generated, the generated parts can be buffered. For example, while a first part of the audio stream is being generated, the first part can be buffered. While a second part following the first is generated, the second part can be buffered. In these embodiments, as well as in others, the first part may have been buffered and sent to the speaker 324 before the second part is generated and sent to the buffer. The buffering length of the audio stream can be determined in operation 352, as explained below.
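The first-in, first-out buffering of operation 346 can be sketched as follows. This is a minimal, synchronous illustration, not the patent's implementation: the class name and a chunk-count depth (standing in for the time-based buffer length of operation 352) are assumptions.

```python
from collections import deque

class AudioFifo:
    """First-in, first-out buffer for generated audio chunks.

    Chunks are buffered as the codec produces them; once the buffer
    holds `depth_chunks` chunks, each newly pushed chunk releases the
    oldest one toward the speaker.
    """
    def __init__(self, depth_chunks):
        self.depth = depth_chunks
        self.buf = deque()

    def push(self, chunk):
        """Buffer one generated chunk; return a chunk that is ready for
        playback, or None while the buffer is still filling."""
        self.buf.append(chunk)
        if len(self.buf) > self.depth:
            return self.buf.popleft()
        return None

    def drain(self):
        """Release the remaining buffered chunks at the end of a message."""
        while self.buf:
            yield self.buf.popleft()
```

With a depth of two chunks, the first two pushes only fill the buffer, and the first chunk is released toward the speaker when the third chunk is generated, mirroring the delayed presentation described above.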
When the message includes audio and video signals, the audio signal and the video signal, which can be sent to the speaker 324 and the screen 320 respectively, can be buffered before being sent. In these embodiments, as well as in others, the video signal of the message can be passed to a codec or other processing element to generate a video stream intended to be displayed by the screen 320. In these and other embodiments, after generation of the video stream and the audio stream, both the video stream and the audio stream may be buffered. The message can be buffered in real time with respect to how the message is presented to the user. Thus, a message that is buffered for four seconds is presented with a four-second delay. In these and other embodiments, an entire message can be buffered, so that a message placed in a four-second buffer can begin to be presented to a user four seconds after the audio signal is first delivered by the codec, and the presentation to the user can finish four seconds after the audio signal has finished being delivered by the codec. In operation 348, the transcription system 308 can generate text based on the audio signal provided by the processing system 314. The transcription system 308 can generate this text using a transcription engine. For example, the transcription system 308 can generate an audio stream from the audio signal. The audio stream can be used by the transcription engine to generate the text of the audio signal. The text of the audio signal can be generated in real time by the transcription system 308. As described, operation 348 and operation 346 can occur substantially at the same time, that is, at overlapping time intervals. In operation 350, the text generated by the transcription system 308 can be sent to the processing system 314. In some embodiments, the text can be transmitted as it is generated.
For example, the audio signal may have a duration of twenty seconds and may include fifty spoken words, with about two or three words spoken every second. In these embodiments, as well as in others, the transcription system 308 can send the words of the text as the words are generated. Consequently, the transcription system 308 can send the words individually, or in small groups of words, to the processing system 314. Thus, the transcription system 308 can continuously stream the text to the processing system 314 at approximately the same rate as that of an audio stream generated by the processing system 314. In operation 352, the processing system 314 can determine a length for buffering the audio stream. In some embodiments, the processing system 314 can determine the buffer length based on a time difference between when the audio signal is sent to the transcription system 308 and when the text is received by the processing system 314 from the transcription system 308. In these and other embodiments, the processing system 314 may not have a buffer length when the processing system 314 begins buffering the audio stream. Rather, the processing system 314 may begin to buffer the audio stream and may not deliver the audio stream until the text is transmitted by the transcription system 308. In these embodiments, as well as in others, the processing system 314 can then determine the buffer length and output the audio stream from the buffer while continuing to buffer the remaining portion of the audio signal for the determined buffer length. In some embodiments, the processing system 314 can determine a length for buffering the audio stream from a buffer length previously determined for a previous message. In these embodiments, as well as in others, the previously determined buffer length can have been determined from the time required to receive the text back from the transcription system 308 after sending the audio signal to the transcription system 308.
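The buffer-length determination of operation 352 can be sketched as a round-trip measurement. This is an illustrative sketch, not the patent's code: `send_audio` and `first_text_received` are hypothetical placeholder callables, and the one- to twenty-second clamp echoes the particular buffer lengths discussed below.

```python
import time

def measure_buffer_length(send_audio, first_text_received):
    """Estimate a buffer length as the time difference between sending
    the audio signal to the transcription system and receiving the
    first text back. `send_audio()` starts the transfer;
    `first_text_received()` blocks until the first words arrive.
    """
    start = time.monotonic()
    send_audio()
    first_text_received()
    return time.monotonic() - start

def choose_buffer_length(measured, lo=1.0, hi=20.0):
    """Clamp the measured round-trip time to a configured range."""
    return min(max(measured, lo), hi)
```

The device can keep the result from one message as the initial buffer length for the next, as the text describes.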
In some embodiments, the processing system 314 can determine a length for buffering the audio stream based on a particular buffer length. This particular buffer length can be determined in advance, before a message is received. The particular buffer length can be determined based on a known average time to receive the text back from the transcription system 308 after sending the audio signal to the transcription system 308. Alternatively or in addition, the particular buffer length can be determined based on the longest or the shortest known time to receive the text back from the transcription system 308 after sending the audio signal to the transcription system 308. Alternatively or in addition, the particular buffer length can be determined from other factors. In some embodiments, the particular buffer length can be between one and twenty seconds. Alternatively or in addition, the particular buffer length may be between two and ten seconds, between three and five seconds, or within some other time interval. In operation 354, in response to receiving the text, the processing system 314 can be configured to continuously stream the received text to the screen 320. In operation 356, the processing system 314 can be configured to send the buffered audio stream to the speaker 324. In these embodiments, as well as in others, the sending of the buffered audio stream to the speaker 324 and the sending of the text to the screen 320 can occur substantially at the same time, for example at overlapping time intervals. In these embodiments, as well as in others, the sent text and the sent buffered audio stream may correspond, since the sent text may be a transcription of the sent buffered audio stream. For example, the text sent at the instant T1 can be a transcription of a first part of the audio stream sent at the instant T1, and the text sent at the instant T2 can be a transcription of a second part of the audio stream sent at the instant T2.
However, without buffering of the audio stream, the audio stream can arrive at the speaker 324 before the text that corresponds to the audio stream. For example, a first part of the audio stream can be sent at the instant T1 to the speaker 324, and there may not be any text sent at the instant T1 to the screen 320. Consequently, at the instant T2, which comes after the instant T1, the text corresponding to the audio stream sent at the instant T1 can be sent to the screen 320 while a second part of the audio stream is sent to the speaker 324. In operation 358, the screen 320 can display the text transmitted by the processing system 314. In operation 360, the speaker 324 can broadcast the audio stream transmitted by the processing system 314. The text can be displayed and the audio stream can be played in such a way that the text is substantially aligned with the audio stream. Text substantially aligned with the audio stream may mean that the words in the audio stream are broadcast at approximately the same time as the text which is a transcription of those words is displayed. In operation 362, the processing system 314 can send a network connection check to the transcription system 308. The network connection check can request a response from the transcription system 308 to allow the processing system 314 to check a network connection between the processing system 314 and the transcription system 308. For example, the processing system 314 can send a ping request to a network access point of the transcription system 308. In operation 364, the transcription system 308 can provide a response to the network connection check sent by the processing system 314. For example, the transcription system 308 can return the ping to the processing system 314. The processing system 314 can be configured to determine a network delay based on a time difference between the network connection check and the reception of the response.
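The network delay measurement of operations 362 and 364, and one way of folding it into the buffer length in operation 366, can be sketched as follows. The patent fixes no formula for the adjustment, so the delta-based rule below is only a plausible assumption, and `ping` is a hypothetical placeholder callable.

```python
import time

def network_delay(ping):
    """Measure the network delay: send a connection check and time the
    response. `ping` blocks until the transcription system answers."""
    start = time.monotonic()
    ping()
    return time.monotonic() - start

def adjust_buffer_length(current_length, old_delay, new_delay):
    """One plausible adjustment rule: grow or shrink the buffer by the
    change in measured network delay, never going below zero."""
    return max(0.0, current_length + (new_delay - old_delay))
```

If the measured delay rises from 0.5 s to 1.0 s, a four-second buffer would be stretched to 4.5 s; if the delay falls, the buffer shrinks accordingly.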
In operation 366, the processing system 314 can be configured to adjust the buffer length based on the time difference between the network connection check and the response. In some embodiments, operations 362, 364 and 366 can occur while the processing system 314 buffers an audio stream. In these and other embodiments, the processing system 314 can determine the buffer length initially for a message, and it can adjust the buffer length during the presentation of the message by the speaker 324 and the screen 320. Alternatively or in addition, the processing system 314 can perform operations 362, 364 and 366 after a first message has been presented and a buffer length has been determined, in order to further adjust the buffer length. For example, the processing system 314 can perform operations 362, 364 and 366 in response to receiving a request to present a second message after the presentation of the first message. Modifications, additions or omissions can be made to the operations 300 without departing from the scope of the present invention. For example, in some embodiments, the operations 300 may not include one or more of the operations. For example, the operations 300 may not include operations 362, 364 and 366. In some embodiments, the operations 300 may include additional operations. As another example, in some embodiments, the operations 300 may be arranged in a different order. For example, operation 354 can occur before operation 356. FIG. 4 shows an example of a system 400 which can be used to automatically delay the playback of a voice message in a captioning device. The system 400 can be arranged according to at least one embodiment described herein. The system 400 may include a processor 410, a memory 412, a file system 414, a communication system 416, a screen 418, a user interface 420 and peripherals 422, all of which can be communicatively coupled. In some embodiments, the system 400 may be part of any of the systems or devices described herein.
For example, the system 400 can be part of any of the first device 104, the second device 106 or the transcription system 108 of FIG. 1. Generally, the processor 410 can include any special-purpose or general-purpose computer, computing entity or processing device comprising various hardware or software computing modules, and it can be configured to execute instructions stored on any applicable computer-readable storage medium. For example, the processor 410 may include a microprocessor, a microcontroller, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any other digital or analog circuit configured to interpret and/or execute program instructions and/or to process data, or any combination thereof. Although shown as a single processor in FIG. 4, it is understood that the processor 410 can include any number of processors, distributed over any number of networks or physical locations, that are configured to perform individually or collectively any operation described herein. In some embodiments, the processor 410 can interpret and/or execute program instructions and/or process data stored in the memory 412, the file system 414, or both the memory 412 and the file system 414. In some embodiments, the processor 410 can fetch program instructions from the file system 414 and load the program instructions into the memory 412. Once the program instructions have been loaded into the memory 412, the processor 410 can execute the program instructions. For example, the system 400 can be part of the first device 104, the second device 106 or the transcription system 108 of FIG. 1. In these embodiments, as well as in others, the instructions can cause the processor 410 to buffer a message until the text corresponding to the audio signal of the message is obtained for presentation, and to present the message and the text simultaneously.
The memory 412 and the file system 414 may include computer-readable storage media for carrying, or having stored thereon, computer-executable instructions or data structures. Such computer-readable storage media can be any available media that can be accessed by a general-purpose or special-purpose computer, such as the processor 410. By way of example, and not limitation, such computer-readable storage media may include non-transient computer-readable storage media including read-only memory (ROM), electrically erasable and programmable read-only memory (EEPROM), CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g. semiconductor memory devices), or any other storage medium which can be used to carry or store desired program code in the form of computer-executable instructions or data structures and which can be accessed by a general-purpose or special-purpose computer. Combinations of the above are also included within the scope of computer-readable storage media. The computer-executable instructions can include, for example, instructions and data configured to cause the processor 410 to execute a certain operation or a certain group of operations, such as one or more of the operations 300 and/or one or more boxes of the methods 500 and 600. The communication system 416 can include any component, device, system or combination thereof that is configured to transmit or receive information over a network. In some embodiments, the communication system 416 can communicate with other devices located in other places, in the same place, or even with other components of the same system.
For example, the communication system 416 may include a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device (such as an antenna), and/or a chipset (such as a Bluetooth device, an 802.6 device (for example for a metropolitan area network (MAN)), a Wi-Fi device, a WiMAX device, cellular communication equipment, etc.), a conventional telephone service and/or the like. The communication system 416 can allow the exchange of data with a network and/or any other device or system described herein. The screen 418 can be configured as one or more displays, such as a liquid crystal display, a light-emitting diode display, or some other type of display. The screen 418 can be configured to present video, text, user interfaces and other data under the control of the processor 410. The user interface 420 can include any device making it possible to interface between a user and the system 400. For example, the user interface 420 can include a mouse, a touchpad, a keyboard, a touch screen, a telephone hook switch, volume controls, other buttons, and/or a telephone keypad, among others. The user interface 420 can receive an input from a user and transmit this input to the processor 410. In certain embodiments, the user interface 420 and the screen 418 can be combined in a single device, such that the screen 418 can display information and serve as an interface allowing a user to interact with the system 400. The peripherals 422 can include one or more devices. For example, the peripherals 422 may include a microphone, an imager and/or a speaker, among others. In these embodiments, as well as in others, the microphone can be configured to capture audio signals. The imager can be configured to capture digital images. The digital images can be captured to produce video or image data. In some embodiments, the speaker may broadcast an audio signal received by the system 400 or otherwise generated by the system 400.
Modifications, additions or omissions can be made to the system 400 without departing from the scope of the present invention. Figure 5 is a flow diagram of an example method 500 for presenting a communication. The method 500 can be arranged according to at least one embodiment described herein. The method 500 can be implemented, in certain embodiments, by a device or system, such as the first device 104 and/or the second device 106, or another device. In these embodiments, as well as in others, the method 500 can be implemented based on the execution of instructions stored on one or more non-transient computer-readable media. Although represented as discrete boxes, the various boxes can be divided into additional boxes, combined into fewer boxes, or eliminated, depending on the desired implementation. The method 500 can start at box 502, where a request can be obtained in a device, in which a user requests the playback of a stored message which includes an audio signal. In some embodiments, the message can be stored in the device. Alternatively or in addition, the message can be stored outside the device. In these embodiments, as well as in others, the method may further include obtaining, in the device, the message via a network. In these embodiments, as well as in others, the audio signal of the message can be directed to the transcription system after being obtained by the device. In box 504, in response to obtaining the request, the audio signal of the message can be directed to a transcription system. The transcription system can be configured to generate text which is a transcription of the audio signal in real time. In box 506, in response to obtaining the request, an audio stream can be generated for presentation by the device. In box 508, the audio stream can be buffered before the audio stream is presented by the device. In box 510, the text generated by the transcription system can be obtained by the device.
In box 512, in response to obtaining the text from the transcription system, the buffered audio stream and the text may be presented by the device such that the presented text is substantially aligned with the presented buffered audio stream. It is understood that, for these processes, operations and procedures, as well as others, the functions and/or operations carried out can be implemented in a different order. In addition, the functions and operations outlined are given only as examples, and some of the functions and operations may be optional, combined into a smaller number of functions and operations, or expanded into additional functions and operations without departing from the essence of the embodiments described. For example, in some embodiments, the method may further include determining a buffer length based on a time difference between transmitting the audio signal to the transcription system and obtaining the text from the transcription system. In these embodiments, as well as in others, the audio signal can be buffered continuously based on the determined buffer length. Figure 6 is a flow diagram of another example method 600 for presenting a communication. The method 600 can be arranged according to at least one embodiment described herein. The method 600 can be implemented, in certain embodiments, by a device or system, such as the first device 104 and/or the second device 106, or another device. In these embodiments, as well as in others, the method 600 can be implemented based on the execution of instructions stored on one or more non-transient computer-readable media. Although represented as discrete boxes, the various boxes can be divided into additional boxes, combined into fewer boxes, or eliminated, depending on the desired implementation. The method 600 can start at box 602, where a request can be obtained in a device, in which a user requests the playback of a stored message which includes an audio signal.
In some embodiments, the message can be stored in the device. Alternatively or in addition, the message can be stored outside the device. In these embodiments, as well as in others, the method may further include obtaining, in the device, the message via a network. In box 604, in response to obtaining the request, the audio signal of the message can be directed to a transcription system from the device. The transcription system can be configured to generate text which is a transcription of the audio signal in real time. In box 606, the text from the transcription system can be obtained in the device. In box 608, the text generated by the transcription system can be presented by the device in real time. In box 610, in response to obtaining the text from the transcription system, the audio signal may be presented by the device such that the presented text is substantially aligned with the audio signal. It is understood that, for these processes, operations and procedures, as well as others, the functions and/or operations carried out can be implemented in a different order. In addition, the functions and operations outlined are given only as examples, and some of the functions and operations may be optional, combined into a smaller number of functions and operations, or expanded into additional functions and operations without departing from the essence of the embodiments described. For example, in some embodiments, the message may further include a video signal. In these and other embodiments, the method may further include, in response to obtaining the text from the transcription system, the presentation by the device of the video signal such that the text is substantially aligned with the audio and video signals.
As another example, the method may further include buffering the audio signal in the device while the audio signal is directed to the transcription system, so that the audio signal directed to the transcription system is also buffered in the device. In these embodiments, as well as in others, the audio signal presented by the device can be the buffered audio signal, so that the text is substantially aligned with the audio signal presented by the device. In these and other embodiments, the method may further include determining a buffer length based on a time difference between transmitting the audio signal to the transcription system and obtaining the text from the transcription system. In these embodiments, as well as in others, the audio signal can be buffered continuously based on the determined buffer length. Alternatively or in addition, the method may further comprise, after determining the buffer length, adjusting the buffer length based on a network connection between the transcription system and the device. The operations 300 and the methods 500 and 600 can improve the operation of a device. For example, the operation of the second device 106 of FIG. 1 can itself be improved by the operations 300 and the methods 500 and 600, at least because the second device 106 can be configured to better align the presentation of the audio signal and of the text obtained from the audio signal from a third party. Furthermore, the operations 300 and the methods 500 and 600 can improve the technical field of telecommunications. For example, the technical field of telecommunications can be improved by the operations 300 and the methods 500 and 600, at least because certain known devices do not correctly align the audio signal and the text obtained from the audio signal. As indicated above, the embodiments described herein may include the use of a special-purpose or general-purpose computer (e.g. the processor 410 of FIG.
4) comprising various computer hardware or software modules, as described in more detail below. In addition, as noted above, the embodiments described herein can be implemented using computer-readable media (e.g. the memory 412 or the file system 414 of FIG. 4) for carrying or having stored thereon computer-executable instructions or data structures. In some embodiments, the various components, modules, engines and services described herein can be implemented as objects or processes that run on a computer system (for example, as separate threads). While some of the systems and methods described herein are generally described as being implemented in software (stored on and/or executed by general-purpose hardware), implementations in specific hardware, or combinations of software and specific hardware implementations, are also possible and contemplated. In accordance with common practice, the various elements illustrated in the drawings may not be drawn to scale. The illustrations presented herein are not intended to be actual views of any particular device (e.g. device, system, etc.) or process; rather, they are merely idealized representations that are used to describe various embodiments of the invention. Therefore, the dimensions of the various elements may be arbitrarily increased or reduced for the sake of clarity. In addition, some drawings may be simplified for the sake of clarity. The drawings may therefore not represent all the components of a given device (e.g. device) or all the operations of a particular process. The terms used herein, and in particular in the appended claims (e.g. the body of the appended claims), are generally intended as "open" terms (e.g. the term "including" should be interpreted as "including, but not limited to", the term "having" should be interpreted as "having at least", the term "includes" should be interpreted as "includes, but is not limited to", etc.).
Furthermore, if a specific number of an introduced claim recitation is intended, such an intent will be expressly recited in the claim, and in the absence of such recitation, no such intent is present. For example, as an aid to understanding, the following appended claims may contain the use of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite article "a" or "an" limits any particular claim containing such an introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g. "a" and/or "an" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is expressly recited, it is understood that such recitation should be interpreted to mean at least the recited number (e.g. the bare recitation of "two recitations", without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to "at least one of A, B and C, etc." or "one or more of A, B and C, etc." is used, in general such a construction is intended to include A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together, etc. For example, the use of the term "and/or" is intended to be construed in this manner. Furthermore, any disjunctive word or phrase presenting two or more alternative terms, whether in the description, in the claims or in the drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms.
For example, the phrase "A or B" should be understood to include the possibilities of "A," "B," or "A and B." Additionally, the terms "first," "second," "third," etc. are not necessarily used herein to connote a specific order or a specific number of elements. Generally, the terms "first," "second," "third," etc. are used to distinguish between different elements as generic identifiers. Absent a showing that the terms "first," "second," "third," etc. connote a specific order, these terms should not be understood to connote a specific order. Furthermore, absent a showing that the terms "first," "second," "third," etc. connote a specific number of elements, these terms should not be understood to connote a specific number of elements. For example, a first widget may be described as having a first side and a second widget may be described as having a second side. The use of the term "second side" with respect to the second widget may be to distinguish such side of the second widget from the "first side" of the first widget, and not to connote that the second widget has two sides. All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Claims (20)

1. Captioning device, comprising: a screen; a speaker; a processor communicatively coupled to the screen and the speaker; and at least one computer-readable medium communicatively coupled to the processor and configured to store one or more instructions which, when executed by the processor, cause the captioning device to perform operations comprising: obtaining, in the captioning device, a request from a user to play a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal from the captioning device to a transcription system, the transcription system being configured to generate text that is a transcription of the audio signal in real time; in response to obtaining the request, generating an audio stream to be sent to the speaker; buffering the audio stream before sending the audio stream to the speaker; obtaining, in the captioning device, the text generated by the transcription system; presenting, on the screen, the text from the transcription system in real time; and in response to obtaining the text from the transcription system, sending the buffered audio stream to the speaker for presentation by the speaker such that the text is substantially aligned with the audio stream presented by the speaker.

2. Captioning device according to claim 1, wherein the message is stored in the computer-readable medium or media of the captioning device.

3. Captioning device according to claim 1, wherein the message is stored outside the captioning device and the operations further comprise obtaining, in the captioning device, the message via a network, wherein the audio signal of the message is directed to the transcription system after being obtained by the captioning device.

4. Captioning device according to claim 1, wherein the message further comprises a video signal and the operations further comprise, in response to obtaining the text from the transcription system, presenting the video signal on the screen such that the text is substantially aligned with the audio and video signals.

5. Captioning device according to claim 1, wherein the operations further comprise, during an interval between obtaining the request from the user and presenting the audio signal, presenting, on the screen, a message alerting the user to a delay in the presentation of the audio signal and the text.

6. Captioning device according to claim 1, wherein the operations further comprise determining a buffer length based on a time difference between directing the audio signal to the transcription system and obtaining the text from the transcription system, the audio stream being continuously buffered based on the determined buffer length.

7. Captioning device according to claim 6, wherein the operations further comprise, after determining the buffer length, adjusting the buffer length based on a network connection between the transcription system and the captioning device.

8. Captioning device according to claim 6, wherein the message is a first message and the operations further comprise: obtaining, in the captioning device, a second request from a user to play a second stored message comprising a second audio signal; in response to obtaining the second request, directing the second audio signal from the captioning device to a transcription system, the transcription system being configured to generate second text that is a transcription of the second audio signal in real time; buffering a second audio stream based on the second audio signal for the determined buffer length; and sending the second buffered audio stream to the speaker for presentation by the speaker independently of the second text provided by the transcription system.

9. Method for presenting communications, the method comprising: obtaining, in a device, a request from a user to play a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal of the message from the device to a transcription system, the transcription system being configured to generate text that is a transcription of the audio signal in real time; obtaining, in the device, the text from the transcription system; presenting, by the device, the text generated by the transcription system in real time; and in response to obtaining the text from the transcription system, presenting, by the device, the audio signal such that the presented text is substantially aligned with the audio signal.

10. Method according to claim 9, wherein the message is stored in the device.

11. Method according to claim 9, wherein the message is stored outside the device and the method further comprises obtaining, in the device, the message via a network, wherein the audio signal of the message is directed to the transcription system after being obtained by the device.

12. Method according to claim 9, wherein the message further comprises a video signal and the method further comprises, in response to obtaining the text from the transcription system, presenting, by the device, the video signal such that the text is substantially aligned with the audio and video signals.

13. Method according to claim 9, further comprising: generating an audio stream to be presented by the device; and buffering the audio stream before the audio stream is presented by the device; wherein the audio signal presented by the device is the buffered audio stream such that the text is substantially aligned with the buffered audio stream presented by the device.

14. Method according to claim 13, further comprising determining a buffer length based on a time difference between directing the audio signal to the transcription system and obtaining the text from the transcription system, wherein the audio stream is continuously buffered based on the determined buffer length.

15. Method according to claim 14, further comprising, after determining the buffer length, adjusting the buffer length based on a network connection between the transcription system and the device.

16. One or more computer-readable media configured to store one or more instructions which, when executed by one or more processors, cause a device to perform the method of claim 9.

17. Method for presenting a communication, the method comprising: obtaining, in a device, a request from a user to play a stored message that includes an audio signal; in response to obtaining the request, directing the audio signal of the message to a transcription system, the transcription system being configured to generate text that is a transcription of the audio signal in real time; in response to obtaining the request, generating an audio stream to be presented by the device; buffering the audio stream before the audio stream is presented by the device; obtaining, in the device, the text generated by the transcription system; and in response to obtaining the text from the transcription system, presenting, by the device, the buffered audio stream and the text such that the presented text is substantially aligned with the presented buffered audio stream.

18. Method according to claim 17, further comprising determining a buffer length based on a time difference between directing the audio signal to the transcription system and obtaining the text from the transcription system, wherein the audio stream is continuously buffered based on the determined buffer length.

19. Method according to claim 17, wherein the message is stored in the device.

20. Device comprising at least one computer-readable medium configured to store one or more instructions which, when executed by one or more processors, cause the device to perform the method of claim 17.
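The buffered-playback scheme recited in the claims above can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions, not the patented implementation: `CaptionedPlayback`, `fake_transcribe`, and the `play`/`display` callables are hypothetical stand-ins for a real transcription service, speaker, and screen. The sketch holds each audio chunk in a buffer until its transcription text arrives, records the observed transcription delay as the buffer length (as in claims 14 and 18), and only then releases the audio, so the presented text stays substantially aligned with the presented audio.

```python
import time
from collections import deque

class CaptionedPlayback:
    """Minimal sketch: buffer audio until its transcription arrives,
    then present text and audio together so they stay aligned."""

    def __init__(self, transcribe, play, display):
        self.transcribe = transcribe  # audio chunk -> text (may be slow)
        self.play = play              # sends an audio chunk to the speaker
        self.display = display        # shows text on the screen
        self.buffer_len = None        # observed transcription delay, seconds

    def present(self, audio_chunks):
        buffered = deque()
        for chunk in audio_chunks:
            sent_at = time.monotonic()
            text = self.transcribe(chunk)   # blocks until the text is ready
            delay = time.monotonic() - sent_at
            if self.buffer_len is None:
                # Derive the buffer length from the time difference between
                # sending the audio and obtaining its transcription.
                self.buffer_len = delay
            buffered.append((chunk, text))
            # Release buffered audio only once its text has been obtained.
            while buffered:
                c, t = buffered.popleft()
                self.display(t)
                self.play(c)

# Demo with a stand-in transcriber that upper-cases the audio "chunks"
# after a small artificial delay.
played, shown = [], []

def fake_transcribe(chunk):
    time.sleep(0.01)  # simulated transcription latency
    return chunk.upper()

device = CaptionedPlayback(fake_transcribe, played.append, shown.append)
device.present(["hello", "world"])
```

In a real device the transcription call would be asynchronous and the buffer length would be adjusted continuously from the network connection, as claims 7 and 15 describe; the blocking call here merely makes the alignment idea explicit.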
Patent family:
Publication number | Publication date
WO2019060113A1 | 2019-03-28
US20200251126A1 | 2020-08-06
US10224057B1 | 2019-03-05
US20190096423A1 | 2019-03-28
Legal status:
2019-01-23 | PLFP | Fee payment | Year of fee payment: 2
2020-11-06 | ST | Notification of lapse | Effective date: 2020-10-06
Priority:
Application number | Filing date | Patent title
US15/714,526 (US10224057B1) | 2017-09-25 | Presentation of communications
US15714526 | 2017-09-25